Cross-lingual Synonymy Overlap

Authors

  • Anca Dinu
  • Liviu P. Dinu
  • Ana Sabina Uban

Abstract

In this paper, we investigate the degree of overlap between the synonym sets of translated word pairs across three languages: French, English and Romanian. For this purpose, we use a French Synonym Dictionary, a Romanian Synonym Dictionary, Princeton's WordNet and the Google Translate API. We build a database containing pairs of (translated) words from the three languages, along with their corresponding synonym sets, and query it in various ways to gain insight into the synonym overlap for each language pair and, thus, into their degree of common concept lexicalization. While the overall percentage of common synonyms is (as expected) quite small, averaging ~6% across all language pairs, the percentage of hard synonym pairs (pairs that have at least one common synonym) is significant, reaching ~62%. This is encouraging for further use of this special kind of translated word pair in tasks such as the automatic enhancement of lexical databases (such as WordNet) for less-resourced languages such as Romanian, based on the corresponding English versions of these lexical databases. Another query of interest yielded the distribution of hard synonym pairs as a function of their part of speech: hard synonyms were most frequent among verbs for English, and among adjectives for Romanian and French.
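The overlap measures described in the abstract can be illustrated with a short sketch. It assumes a hypothetical `TranslatedPair` record in which the source word's synonyms have already been translated into the target language, so the two synonym sets can be intersected directly; the abstract does not say how cross-lingual synonyms were matched. The normalization of the overlap percentage (Jaccard, |A ∩ B| / |A ∪ B|) is likewise an assumption, and the example words are toy data, not entries from the authors' database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslatedPair:
    """Hypothetical record: a word, its translation, and both synonym sets.

    Assumption: src_synonyms_translated holds the source word's synonyms
    already translated into the target language, so the sets are comparable.
    """
    source_word: str
    target_word: str
    pos: str
    src_synonyms_translated: frozenset
    tgt_synonyms: frozenset


def common_synonyms(pair):
    """Synonyms shared by both sides of the translated pair."""
    return pair.src_synonyms_translated & pair.tgt_synonyms


def overlap_ratio(pair):
    """Share of common synonyms (Jaccard normalization is an assumption)."""
    union = pair.src_synonyms_translated | pair.tgt_synonyms
    if not union:
        return 0.0
    return len(common_synonyms(pair)) / len(union)


def is_hard_synonym_pair(pair):
    """A pair is 'hard' if the two synonym sets share at least one word."""
    return bool(common_synonyms(pair))


def summarize(pairs):
    """Mean overlap, share of hard pairs, and hard-pair counts per POS."""
    if not pairs:
        return {"mean_overlap": 0.0, "hard_share": 0.0, "hard_by_pos": Counter()}
    hard = [p for p in pairs if is_hard_synonym_pair(p)]
    return {
        "mean_overlap": sum(overlap_ratio(p) for p in pairs) / len(pairs),
        "hard_share": len(hard) / len(pairs),
        "hard_by_pos": Counter(p.pos for p in hard),
    }


# Toy English-French pairs (illustrative only).
big = TranslatedPair("big", "grand", "adj",
                     frozenset({"gros", "vaste", "large"}),
                     frozenset({"vaste", "haut", "immense"}))
house = TranslatedPair("house", "maison", "noun",
                       frozenset({"logement"}),
                       frozenset({"demeure", "foyer"}))

print(summarize([big, house]))
```

Here `big`/`grand` share one synonym out of five distinct words, so it counts as a hard synonym pair, while `house`/`maison` shares none. Counting hard pairs per part of speech mirrors the POS distribution query; normalizing those counts by the number of pairs per POS would be an equally plausible reading.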

Similar resources

OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures

We report here our work on English-French cross-lingual word sense disambiguation, where the task is to find the best French translation for a target English word depending on the context in which it is used. Our approach relies on identifying the nearest neighbors of the test sentence from the training data using a pairwise similarity measure. The proposed measure finds the affinity between two...

A Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data

Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature spaces of the source language data and that of the target language data. To addres...

Bilinguals activate words from both languages when listening to spoken sentences: Evidence from an ERP-study

The current study examines whether bilingual word recognition in spoken sentences is influenced by cross-lingual phonological similarity. ERPs were measured while German-English bilinguals listened to German sentences. Target words in the sentences were either German-English homophones (e.g., eagle – Igel ‘hedgehog’), German words that were phonologically closely related to English words (e.g., ...

Overview of the Cross-lingual Expert Search (CriES) Pilot Challenge

This paper provides an overview of the cross-lingual expert search pilot challenge as part of the cross-lingual expert search (CriES) workshop collocated with the CLEF 2010 conference. We present a detailed description of the dataset used in the challenge. This dataset is a subset of an official crawl of Yahoo! Answers published in the context of the Yahoo! Webscope program. Further we describe...

Translation Equivalence and Synonymy: Preserving the Synsets in Cross-lingual Wordnets

The Princeton WordNet for English was founded on the synonymy relation, and multilingual wordnets are primarily developed by creating equivalent synsets in the respective languages. The process would often rely on translation equivalents obtained from existing bilingual dictionaries. This paper discusses some observations from the Chinese Open Wordnet, especially from the adjective subnet, to i...

Publication date: 2015